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Abstract. Importance sampling is a promising variance reduction technique for Monte Carlo 
simulation based derivative pricing. Existing importance sampling methods are based on a 
parametric choice of the proposal. This article proposes an algorithm that estimates the 
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j_i optimal proposal nonparametrically using a multivariate frequency polygon estimator. In 

Oh 

contrast to parametric methods, nonparametric estimation allows for close approximation of 
^- the optimal proposal. Standard nonparametric importance sampling is inefficient for high- 

dimensional problems. We solve this issue by applying the procedure to a low-dimensional 
subspace, which is identified through principal component analysis and the concept of the 
effective dimension. The mean square error properties of the algorithm are investigated and 
its asymptotic optimality is shown. Quasi-Monte Carlo is used for further improvement of 
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the method. It is easy to implement, particularly it does not require any analytical computa- 
tion, and it is computationally very efficient. We demonstrate through path-dependent and 

> 

0> multi-asset option pricing problems that the algorithm leads to significant efficiency gains 
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i/-) compared to other algorithms in the literature. 
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1. INTRODUCTION 



In the last decade, the complexity of the pricing models used for evaluation of financial prod- 
ucts experienced a distinct increase. As a consequence of this development, pure numeri- 
cal methods became more and more inadequate for the arisen high-dimensional integration 
tasks. Often, Monte Carlo (MC) integration is the only feasible method. This stems from the 
fact that the MC convergence rate is independent of the problem dimension. However, crude 
MC is often inefficient for practical sample sizes. Raising computing power and increasing 
the sample size is no solution. The need of efficient MC procedure is apparent. 

To make MC algorithms comparable, it is useful to quantify the efficiency of an estimator. 
Let's assume X is a random variable defined on a probability space (17, B, P) and it is used to 
estimate some quantity fi. The computational efficiency (CE) of estimator X can be defined 
through 

CELY] = (MSELY1CLY])- 1 , 

where MSE[X] denotes the mean square error of X and C[X] the average costs of comput- 
ing one realization of X (L'Ecuyer 1994). From this definition one observes that efficiency 
improvements can be achieved either by reducing the MSE or the computational costs. 
The former includes well-known variance reduction (VR) techniques such as importance 
sampling (IS), antithetic sampling, moment matching, and control variates (Jackel 2002; 
Glasserman 2004; Robert and Casella 2004). Most VR techniques aim at improving a given 
set of samples, that is used for MC integration. In contrast, IS is based on changing the 
distribution from which the samples are drawn. The idea behind IS is to select a distribu- 
tion (which is known as proposal) that forces the samples into the domain which is most 
important to the integrand. Intuitively, this is particularly useful for derivatives that rely on 
rare events. A deep out-of-the money option is an obvious example for rare event depen- 
dency. Crude MC would only rarely produce samples which lead to non-zero payouts and, 
consequently, the MC variance would be severe. However, IS is by far not limited to rare 
event cases. Compared to other VR techniques the usage of IS is more involved, because 
the selection of a suitable proposal is generally difficult. But the additional effort is justified 
by the large potential of IS to reduce the MC variance. 

IS has been successfully applied to derivative pricing based on Gaussian proposals. That 
is, the proposal was chosen from some class of Gaussian distributions. An important ap- 
proach is based on a mean shift, which can be obtained through saddle point approxima- 
tion (Glasserman, Heidelberger, and Shahabuddin 1999), adaptive stochastic optimization 
(Vazquez-Abad and Dufresne 1998; Su and Fu 2000, 2002), or least squares (Capriotti 
2008). This approach is also known as the "change-of-drift technique". In addition, Gaussian 
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mixture distributions have been utilized for approximating the optimal proposal (Avramidis 
2002). Summarizing, existing approaches are based on parametric IS, that is the proposal 
is chosen from a certain class of distributions. For complex payouts it is hard to set up a 
class which contains a distribution that approximates the optimal proposal reasonably well. 

This paper proposes the usage of nonparametric IS (NIS) for derivative pricing. The basic 
idea of NIS is to use a nonparametric estimate of the optimal proposal. NIS algorithms 
have already been successfully applied to low-dimensional integration problems (Givens 
and Raftery 1996; Zhang 1996; Kim, Roh, and Lee 2000; Neddermeyer 2009). However, 
high-dimensional integration tasks have not been considered until now. As a result of the 
curse of dimensionality and computational limitations NIS cannot be applied directly to high- 
dimensional derivative pricing. We suggest to restrict NIS to those coordinates that are of 
most importance to the integration problem. This approach can be justified by the concept 
of the effective dimension (ED). To reduce the effective dimension and to identify the most 
relevant coordinates principal component analysis (PCA) is applied. 

The advantage of NIS compared to parametric IS is its close approximation of the optimal 
proposal. We prove that the VR factor of our nonparametric method increases with sam- 
ple size converging to the - in some sense - optimal value. Note, parametric IS methods 
achieve constant VR factors. It is shown through simulations that the proposed algorithm is 
computationally more efficient than parametric IS for well-known benchmark option pricing 
problems. In the case of low ED, the algorithm does not only outperform in terms of MSE 
but also in terms of computational costs. In other words, it is not only more accurate but 
also computationally cheaper. In addition, it is easy to implement based on the C++ im- 
plementation of the nonparametric estimator which is provided. NIS and most parametric 
IS methods share the property that they can be combined with other VR techniques. This 
is demonstrated through the use of low-discrepancy sequences (also known as quasi-MC 
(QMC)). 

This paper is organized as follows. In the next section, a general MC option pricing 
framework is introduced and IS is briefly reviewed. In Section 3, a nonparametric partial IS 
algorithm is proposed and its MSE convergence properties are investigated. The concept of 
the ED is reviewed in Section 4. In Section 5, path construction based on PCA is discussed, 
followed by a brief introduction to QMC (Section 6). A comparison to parametric IS is pre- 
sented in Section 7 and a detailed description of the implementation is given in Section 8. 
Finally, in Section 9 simulation results are reported followed by conclusions (Section 10). 



2 



2. DERIVATIVE PRICING AND IMPORTANCE SAMPLING 



Let's describe the evolution of the underlying asset through a stochastic differential equa- 
tion (SDE) of the form 

dS(t) = rS(t) dt + a{S{t))S{t)dW{t), (1) 

where W(t) is a standard Brownian motion (BM); r and a are the risk-free interest rate and 
the volatility, respectively. Within this model, evaluating the price of a European option with 
payout function C K {S), strike level K, and expiry T means computing 

E[exp(-rT)C K (S)), (2) 

where the expectation is taken with respect to the risk neutral measure. Except of special 
cases, there is no explicit solution for SDEs of kind 0. Therefore, it is required to migrate 
to some discretization S tk of the process S(t), which is defined on a discrete time grid 
= t < h < ■ ■ ■ < t d = T. The first order Euler discretization scheme yields 

S t k+1 = S tk + rS tk (t k+ i - t k ) + a{S tk )S tk ^ft k+ i - t k Z tk (3) 

with standard normal innovations Z tk . In the following, we focus on an equally-spaced time 
grid, i. e. U - U-\ = At = const. Based on this discretization, the option price (|2) can be 
approximated through the integral 

I<p— </'( x )p( x )^ x ) 

where </?(x) = exp(-rT)C^(5(x)). p denotes the density of the multivariate Gaussian distri- 
bution Af(0,I d ) with I d being the identity matrix of dimension d. By writing 5(x), it is meant 
that a trajectory of S tk is built up based on the innovations x = (xi, . . . ,x d ) T . To keep 
the discretization bias small, it is required to choose d considerably large which leads to a 
high-dimensional integration problem. The crude MC estimator is given by 

where x-^x 2 , . . . ,x. N are drawn from p. The estimator's large variance renders it impractical 
for approximating complex integrals. To construct a more efficient MC estimator, IS can be 
applied. The basic idea of IS is to sample from a proposal q instead of p. The IS estimator 
is defined through 

1 N 

8=1 
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with likelihood ratio «;(x) = p(x)/g(x) and samples x^x 2 ,...^^ drawn from g. For f' s to 
converge to I v , it is required that the support of q includes the support of \<p\p. Under the 
additional assumption \/ar q [<pw] < oo, a central limit theorem holds 

VN(P*-I v ) =>Af(0,of s ), 

where af Q = E q [<pw - I^] 2 (Rubinstein 1981). This result allows the construction of confi- 
dence intervals for I® and establishes the convergence rate 0{N~ 1 / 2 ). The optimal pro- 
posal, which minimizes the (asymptotic) variance of s , is given by 

Opt/ N = l^( X )IP( X ) 

q { > J>(x)|p(x)dx- 

Remarkably, the IS estimator based on the optimal proposal q opt has zero variance for func- 
tions ip with a definite sign. However, the optimal proposal is unavailable in practice because 
of its unknown denominator. 

3. NONPARAMETRIC PARTIAL IMPORTANCE SAMPLING 

In this section, nonparametric partial IS (NPIS) is introduced as a generalization of the 
NIS algorithm discussed in Zhang (1996) and Neddermeyer (2009). NIS is a two-stage 
procedure. In the first stage, the optimal proposal is estimated nonparametrically based on 
samples drawn from a trial distribution q . In the second stage, this nonparametric density 
estimate is used as proposal for IS. We pick up this approach, but instead of approximat- 
ing the optimal proposal in the entire space, we focus on the optimal proposal in a certain 
subspace. That is, the nonparametric IS procedure is restricted to a low-dimensional sub- 
problem in order to avoid the curse of dimensionality. We decompose x = (x„,x_„), where 
u c {1,2, .. .,d}. The cardinality of u is denoted by |u|. Let's consider the marginalized 
optimal proposal obtained through integration with respect to x „. It is given by 

q°?\x u ) = [ q°P\yL)dx- u . 

jRd-\u\ 

Subspace u is chosen such that it covers those coordinates which are most important to the 
integrand (see Section 4). To limit the computational burden of the nonparametric method, 
u will be considerably small in practice (1 < |u| < 3). In the NPIS algorithm, q opt is es- 
timated nonparametrically. Now, the choice of the nonparametric estimator is discussed. 
Typically, kernel estimators have been used within NIS algorithms. However, sampling and 
evaluation is computationally very inefficient for kernel estimators. In this article, a multi- 
variate variant of the nonparametric frequency polygon estimator is used, which is known 
as linear blend frequency polygon (LBFP) (Terrell 1983). The LBFP estimator attains the 
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same MSE convergence rate as kernel estimators, namely £>(iV~ 4 /( 4+d )), but it is com- 
putationally more efficient. The generation and the evaluation of N samples is of order 
0{2 d - 1 d 2 N^ d+b ">^ d+ ^) (Neddermeyer 2009), whereas kernel estimators require 0{dN 2 ) op- 
erations. The construction of an LBFP consists of interpolations of the bin mid-points of 
a multivariate histogram. Let's consider a multivariate histogram estimator with bin height 
fk u ...,k d for bin B ki,...,k d = Ut=i[tk i -h/2,t ki + h/2), where h is the bin width and (t kl , . . . ,t kd ) 
the bin mid-point. For x e fliLi > *fc* + h ) tne LBFP estimator is defined as 



/» = E 

ji,-Jd6{0,l} 



n( Xj t ki \ J ' / ^ Xj t ki ^ 



f k 1 +j 1 ,...,k d +j d - (4) 



In the one-dimensional case, / is just a linearly interpolated histogram. It can be shown that 
it integrates to one. 

Algorithm - Nonparametric Partial Importance Sampling 

Stage 1: Nonparametric estimation of the marginalized optimal proposal 

• Select subset u, bin width h, trial distribution q , and sample sizes M and N. 

• For j = 1, . . . , M: Generate sample x? ~ q . 

• Obtain nonparametric estimate <f pt of marginalized optimal proposal g opt 

oOPtfx ) - 

q {*u) - 1 M , 

where (J = \ip(^)\ P (^)q (^)- 1 and 

ji,...j H e{o,i} Lieu v / v / 

M 

X E ^ 1 n ieu [t ki+ji -h/2,t ki+ji +h/2) (x J ) 

i=i 

forx u G n ie «[*fc i .<fc i + '»)- 
Sfage 2/ Partial Importance Sampling 

• For i = 1, . . . , N: Generate samples ~ q op \x u ) and x l l u ~ p(x_ u ). 

• Evaluate 

i=i 

The following theorem investigates the MSE convergence properties of the NPIS algorithm 
to obtain the optimal value for bin width h. 
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Theorem 1 . Suppose that the assumptions given in Appendix A hold, if > 0, and p(x) 

p(x u )p(x- u ). We denote q = q opt . Then, we obtain fori^[' s (as defined in Appendix A) 



-p\tNPIS j 12 _ 1 



and the optimal bin width 



Kx)Mx_ u ) r 2 H g l x (1 + o(1)) ' 

v 1 3l«lM/iM I 



(5) 



where 



and 




l£«. 



{q o P ty< 



u(x) = ip(x)p(x u ) - J (p(x)p(x)dx- u . 



Proof. See Appendix A. 

The left and right term in the brackets in ^ can be interpreted as the variance caused 
by the components x_ u and x u , respectively. Note, subset u is chosen such that the left 
term is small compared to the right one. The expression in braces quantifies the MSE of the 
nonparametric estimate, which depends on both <f pt and trial distribution q . For h = /i opt 
and M/N -* A g (0, 1) (M, N oo) the theorem implies 

E[/NP'S_ /v] 2 = 1 / Kx)Mx- M ) rfx+0(jV _ (8+H)/(4+H)) ^ 

Hence, the variance caused by x u is of lower order. In other words, the optimal vari- 
ance (for partial IS on coordinates u) is achieved asymptotically. As a consequence, com- 
pared to crude MC and parametric IS techniques, NPIS is expected to yield increasing ef- 
ficiency as the sample size grows. Furthermore, if \u\ = d the MSE converges as fast 
as £)(jV-( 8+d )/( 4+(i )), which is a massive improvement compared to the standard MC rate 
0(N for d that is small (Zhang 1996; Neddermeyer 2009). Note, the results of this 
section also hold for distributions p other than the standard normal distribution. 

In this article, NPIS is only investigated for non-negative integrands. However, by decom- 
posing the payout function C = C + - C , NPIS can also be applied to financial derivatives 
that have both positive and negative payouts. 

4. EFFECTIVE DIMENSION 

The NPIS algorithm is based on the restriction on specific coordinates x u , where in high- 
dimensional integration problems \u\ < d. This approach can be justified by the concept 
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of the ED. It is well known, that many integration problems in financial engineering, despite 
having a large nominal dimension, are low-dimensional in terms of the ED. For a rigorous 
definition of the ED, let's consider the functional analysis of variance (ANOVA) decomposi- 
tion. Suppose / v?(x) 2 p(x)dx < oo and p(x) = nf=iP( x «) is a product density. Then, tp can 
be written as a sum of 2 d orthogonal functions 

^( x ) = < A*( x «)> 

uC{l,2,...,d} 

where the ANOVA functions ip u axe given recursively by 

^(x„) = / <p(x)p(x_ u )eZx_ u - V]^(x„). 

yR--i«i tEt 

Now, the fraction of the variance a 2 = Var p [^], that is explained by certain lower-dimensional 
ANOVA functions, is considered. For this purpose, the variance of (p u is defined by a 2 = 
J Rd ip u (x u ) 2 p(x.)dx, where a\ = 0. As the ANOVA decomposition is orthogonal, one has 
a 2 = Y<u a u- Hence, r„ = ^2 vCu a 2 can be interpreted as the contribution of x u to the 
total variance of ip. For a more detailed description of the ANOVA decomposition see, for 
instance, Takemura (1983) and Owen (1992). The following definition of the ED is due to 
Caflisch, Morokoff, and Owen (1997). 

Definition 1. The ED (in the truncation sense) is the cardinality of the smallest subset u 
such thatT u > ^a 2 with < 7 < 1. 

The threshold 7 is chosen close to one. In the framework of this article, we found 7 = 0.9 
reasonable. The ED does not only allow to identify those coordinates which most effect the 
integral value but it also indicates how many coordinates are required to cover a certain 
amount of the variance. An MC procedure that allows one to determine the ED of a given 
problem is described in Appendix B. 

5. GAUSSIAN MODELS 

The purpose of this section is to show how NPIS can be applied to models that are based 
on the integration with respect to high-dimensional Gaussian distributions. As mentioned 
before, NPIS is inefficient as a result of the curse of dimensionality unless the ED (and thus 
\u\) is small. For typical financial integration problems it is generally not advisable to apply 
NPIS with \u\ larger than 3, unless the number of paths to be sampled is huge or the domain 
of interest is very small (rare event case). 
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Suppose the task is to integrate with respect to Af(0, E). Now, PCA is used to transform 
the problem. The (positive-definite) covariance matrix E is written as 

S = VAV T , 

with A = diag(Ai, A 2 , . . . , A d ) and eigenvalues A*. The columns of V are the corresponding 
unit-length eigenvectors. Thus, one has VA 1/2 Z ~ AA(0,E) for Z ~ Af(0,I d ). Without loss 
of generality, it is assumed that the eigenvalues (and the corresponding eigenvectors) are 
sorted so that Ai > A 2 > • • • > A d . The PCA construction of samples from jV(0, E) is optimal 
in the sense that it provides an optimal lower-dimensional approximation (in the MSE sense) 
to the random variable of interest. This means that the first k components of Z explain as 
much as possible of the total variance. More precisely, it can be shown that they explain the 
fraction (Ai + A 2 + . . . + A fc )/(Ai + A 2 + . . . + A d ). 

The option pricing problem introduced in Section 2 leads to the construction of discretized 
BM paths based on samples from the multivariate Gaussian distribution. Paths are most 
easily build up through the random walk construction guided by ([3j. In this construction 
each component "counts roughly the same" rendering the restriction on a lower-dimensional 
subspace and hence the application of NPIS impractical. Note, the integral I v can be rewrit- 
ten as I v = J <£(x)pjV( 0i £)(x)dx, where E is the covariance matrix of the discretized BM with 
entries E^- = min{^,t,}. This suggests that PCA can be used to reduce the ED. The PCA 
construction of discretized BM paths has a continuous limit known as Karhounen-Loeve 
expansion of BM: 

oo 

Wit) = \f\ipi(t)Zi, < t < 1, 

1=1 

where ^(t) = y/2em{(i-0.5)irt}, Ai = {(i-0.5)vr}- 2 , and Zi ~AA(0,1) (Adler1990). Based 
on the expression for \, it is easily shown that Zi explains the fraction 2\ of the path's 
variability (which is approximately 81%, 9%, 3% for i = 1,2,3, respectively). These values 
are not only of asymptotic nature but also hold for a small number of discretization steps 
(with slight deviations). This astonishing result claims that very few PCA components suffice 
to determine most of the path's variation no matter how long or detailed it is. Particularly, 
the first PCA component plays a dominant role and has a nice geometrical interpretation. 
Roughly speaking, it determines the path's direction in the path space. This is visualized in 
Figure Q] 

Another common method for the reduction of the ED (of a discretized BM) is the Brownian 
Bridge technique. In this paper, the focus is on PCA because of its optimality property. 
However, it is remarked that in certain situations Brownian Bridge techniques are superior 
to PCA. This may especially be the case if the payout function only depends on the terminal 
value of the underlying. Note, NPIS can also be combined with Brownian Bridge techniques. 
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Figure 1 : Discretized BM paths: first PCA component varies whereas other components are fixed to random 
values (left); first PCA component is fixed and other components vary randomly (right). 

6. QUASI-MONTE CARLO INTEGRATION 

QMC is often used to (further) improve MC estimators. In contrast to MC, QMC integration 
uses so-called low-discrepancy sequences instead of (pseudo-) random numbers. Low- 
discrepancy numbers are constructed to fill the space more evenly. For a description of 
the construction of low-discrepancy sequences readers are referred to Niederreiter (1992), 
Glasserman (2004), and the references given there. The incentive to work with QMC is 
justified by its deterministic error bound of order 0(N~ 1 log d N), which follows from the 
well-known Koksma-Hlawka inequality (see Niederreiter (1992)). This bound is merely of 
theoretical benefit because the computation of the involved constants (including the Hardy- 
Krause variation of the integrand) is infeasible or at least very difficult. However, it suggests 
that QMC should massively outperform MC in low-dimensional integration problems. The 
advantage of QMC diminishes with increasing dimension. Nevertheless, it is well known in 
the financial engineering literature, that QMC may be effectively applied to high-dimensional 
problems (Paskov and Traub 1 995; Ninomiya and Tezuka 1 996; Traub and Werschulz 1 998). 
This stems from the fact mentioned earlier that many problems in finance have rather low ED 
compared to the nominal dimension. As the convergence properties of QMC become worse 
in higher dimensions, it is important to assign the first coordinates to the most relevant 
dimensions of the integration problem. In our setting, the relevant coordinates are those 
contained in u. 

A drawback of QMC is the lack of randomness, which impedes the computation of the 
MSE for assessing the accuracy of the estimator. This issue can be resolved by randomizing 
the deterministic low-discrepancy sequence to achieve independent realizations of the QMC 
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estimator. Different approaches for randomizing low-discrepancy sequences are available 
including Owen's scrambling (Owen 1995), random digit scrambling (Matousek 1998), or 
random shifts (see Okten and Eastman (2004) for a survey). In our simulations, priority 
is given to the random shift technique because of its straightforward implementation. The 
idea is to shift the entire sequence by a random vector v modulo one. v is drawn from the 
uniform distribution on [0, l) d . That is, a randomized sequence is obtained by substituting 
the quasi-random vectors y* of the original low-discrepancy sequence by (y* + v) mod 1. 

7. COMPARISON TO PARAMETRIC IMPORTANCE SAMPLING 

Until now, the application of IS in finance was limited to parametric IS. In particular, Gaus- 
sian or mixtures of Gaussian distributions have been applied. The variance of a parametric 
IS estimator with proposal qg (and parameter e@) can be written as 



where of s is defined as in Section 2. First, this suggests that, in contrast to NPIS, the VR 
factor is constant because all terms are C(iV _1 ). Second, the variance is critically affected 
by the tails of qg. Using Gaussian proposals, it is often hard to approximate the tails of 
q op[ reasonably well. There lies a distinct advantage of NIS methods. Most parametric IS 
approaches aim at choosing 9 so that |6) is minimized. We now discuss a variant of the 
least-squares IS (LSIS) algorithm (Capriotti 2008) which is directly comparable to NPIS. It 
is based on the Gaussian proposal N(n,ld) with parameter n e R d . Similar to NPIS, it is a 
two-stage algorithm. In the first stage, based on M samples from p, a least-squares problem 
is solved to estimate the optimal drift change (The variance can also be adjusted through 
this procedure.) However, as the problem dimension grows the estimate of fi becomes unre- 
liable. The variant of this algorithm which is suggested here applies LSIS to the coordinates 
x u , that are determined through PCA and the ED (analogous to NPIS). This makes the LSIS 
and the NPIS directly comparable. In Section 9, NPIS and this variant of LSIS are tested 
against each other through simulations. 

Besides the superior convergence properties NPIS has a computational advantage over 
parametric IS which is of relevance in practice. For computing the IS weights, parametric 
IS typically needs to evaluate the exp function which is very expensive. Through the use of 
the LBFP estimator, these evaluations are reduced in the NPIS algorithm. This leads to a 
relevant reduction of the computational costs (compare Section 9). 




(6) 
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Finally, we remark that combinations of parametric IS and NPIS are possible. For in- 
stance, while applying NPIS to x u one can carry out parametric IS on the remaining coordi- 
nates x_ u . 

8. IMPLEMENTATION OF THE ALGORITHM 

In this section, the details of practical implementation of the proposed NPIS algorithm is 
discussed. At first, an overview over the required ingredients for the implementation is given. 

8.1 OVERVIEW 

Subset u: This is chosen according to the ED (7 = 0.9), which can be computed with the 
algorithm given in Appendix B. If PCA is used, the first few principal components are 
selected. 

Trial distribution q : The choice of the trial distribution should be guided by the follow- 
ing two criteria: First, it should allow for efficient sampling and evaluation. Second, 
the marginal distributions of the coordinates contained in u should be overdispersed 
(heavy tailed) compared to the standard normal distribution. An all-purpose trial distri- 
bution, which we found to work well in practice, is discussed in Subsection 8.2. Alter- 
natively, one can use a parametric choice tailored to the specific integration problem or 
one can simply use the (multivariate) standard normal distribution. The latter is often 
not a good choice because of the importance of the tails of the proposal. 

Bin width h: A Gaussian reference rule can be computed in Stage 1 of the algorithm (for 
details see Subsection 8.3). 

Sample size M: For the simulations, we used M = max{256, 0.257V}. In the special case 
when |u| = d an optimal value for the proportion M/N can be derived (Zhang 1996; 
Neddermeyer 2009). 

LBFP estimator: The details of the implementation of the LBFP estimator can be found in 
Neddermeyer (2009). A C++ implementation of the LBFP as well as the R-package 
Ibfp are available on request. 

We emphasize that, in contrast to most parametric IS algorithms, all parameters are ad- 
justed automatically, such that no trial-and-error parameter selection and no analytical com- 
putation are necessary in practice. 
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8.2 TRIAL DISTRIBUTION 



As trial distribution we propose a simple product density. It is composed of a uniform 
distribution on [~pm,pm]^ and the multivariate Gaussian distribution p(x_ M ): 

g (x) = p(x_ u ) x (2 ^ m)H Y[M-PM,PM]( x i)> 

where p M is the (1 + (1 - e) 1 / M )/2-quantile of Af(0, 1). e > is very small, say e = 10~ 4 . 
Consequently, P(max 1 < i < M \Z { \ > p M ) = e holds for standard normal distributed Zj. This 
ensures that the bias caused by the bounded support of the uniform distribution is very 
small. In addition, the uniform distribution guaranties that the space of x u is well explored 
even for a small sample size. 

8.3 PRACTICAL BIN WIDTH SELECTION 

The expression for /i opt given in Theorem [T] is intractable analytically because of the un- 
known constants H x , H 2 . The plug-in method suggested in Zhang (1996) also seems un- 
suitable for our integration problem as derivatives of the integrand are required. We propose 
to apply a Gaussian approximation of Hi, H 2 . Suppose g opt is the density of a centered 
multivariate Gaussian distribution with covariance matrix diag(of ,<rf, . . . ,of u \). Under this 
assumption, it can be shown that 

2,880 ^ 1 w 

For the constant H 2 the mean of q opt plays the dominant role. Therefore, it is assumed that 
<7 opt is the density of A/"((mi,/x 2 , . • • ,Pd) T ',!<*)■ If the trial distribution is chosen as explained 
in the preceding subsection, one yields 

ff 2 « P 2exp£>?]. (8) 

In the algorithm, the expressions in and |8) can be approximated based on the sam- 
ples x 1 , . . . , x A/ . This follows from the fact, that the samples x 7 weighted with u j / YJk=i ^ k 
approximate g opt . 

9. SIMULATION RESULTS 

Different European option pricing scenarios are considered to compare the proposed al- 
gorithms (NPIS and the combination of NPIS and QMC (QNPIS)) with existing methods 
(crude MC, QMC, LSIS, and the combination of LSIS and QMC (QLSIS)). The performance 
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of the algorithms is measured through the VR factors (computed with respect to crude MC) 
and the relative computational efficiency (RCE). The RCE is defined as the ratio of the CE 
of the method of interest to the CE of crude MC. The computational costs are measured in 
seconds. All simulations are done for different sample sizes N in order to demonstrate the 
increasing VR factors of NPIS. 

Examples 1 through 3 consider different single- and multi-asset options within the stan- 
dard Black-Scholes model. There, the price of an asset S at time t is given by 

S(t) = S{0) exp[(r - 0.5a 2 )t + aVtZ] 

with standard normal random variable Z. The simulations are based on the following setting: 
5(0) = 100, a = 0.3, r = 0.05, and T = 1. In Example 4, the pricing of a cap within 
the CIR model is investigated to show the effectiveness of NPIS/QNPIS in a square-root 
diffusion model. For all algorithms, apart from crude MC, the PCA path construction is 
used. The parameters u, q , and h are chosen according to the description in the preceding 
section. Note, Theorem 1 does not apply to QMC sampling. We found empirically that 
QNPIS requires a larger bin width. In the simulations, 3/i opt is used. For LSIS and NPIS, M 
is set as suggested in the preceding section whereas for QNPIS and QLSIS M = 1024 is 
used throughout. The least squares estimates required in LSIS/QLSIS were computed with 
ten iterations of the Levenberg-Marquardt method (Press et al. 1992, pp. 683-688). 

The computations were carried out on a Dell Precision T3400, Intel CPU 2.83GHz. All 
algorithms were coded in C++. The Mersenne Twister 19937 (Matsumoto and Nishimura 
1 998) and the Sobol sequence (Sobol 1 967) were used for pseudo- and quasi-random num- 
ber generation, respectively. The Sobol sequence is randomized by the random shift tech- 
nique. The transformation of uniform random numbers into normal random numbers was 
done by the Beasley-Springer-Moro approximation (Mora 1995). 

Example 1 . Straddle Option 
The payout function of a straddle option is given by 

C K (S) = (S(T) - K)+ + (K- S(T)) + . 

In the Black-Scholes world the pricing of a straddle option is a one-dimensional integration 
problem with multi-modal optimal proposal. Gaussian proposals (such as drift changes) are 
severely inefficient for multi-modal payouts (Capriotti 2008). The optimal proposal and an 
LBFP estimate generated in the NPIS algorithm are shown in Figure [2] The LBFP estimate 
closely approximates the optimal proposal. To account for the bimodality, we used 2h op[ as 
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Figure 2: Standard normal distribution (dotted line), optimal proposal for a straddle option within the Black- 
Scholes model (dashed line), and an LBFP estimate of the optimal proposal (solid line). Model parameters: 
5(0) = 100, a = 0.3, r = 0.05, T = 1, and K = 100. 



Parameters 






VR (RCE) 




N 


if 


QMC 


LSIS 


NPIS 


QLSIS 


QNPIS 


2 io 


100 
110 


224 (380) 
253 (548) 


1.3 (0.4) 
1 (0.4) 


9 (0.9) 
6 (0.8) 


1,064(127) 
964 (150) 


2.3 xl0 b (8,469) 
3.2 xlO 5 (1.5 xlO 4 ) 


2 n 


100 
110 


264 (557) 
290 (532) 


1.3 (0.5) 
1 (0.3) 


13(1.7) 
8(1) 


1,361 (384) 
1,092 (291) 


2.6 xl0 b (2.1 xlO 4 ) 
3.1 xlO 5 (2.4 xlO 4 ) 


2 ia 


100 
110 


460 (941) 
505 (953) 


1.3 (0.4) 
1 (0.3) 


17 (2.2) 
11 (1.3) 


2,209 (1,006) 
2,201 (965) 


6.8 xl0 b (8.6 xlO 4 ) 
7.4 xlO 5 (8.7 xlO 4 ) 



Table 1 : The table reports the variance reduction (VR) factors and the relative computational efficiency (RCE) 
for a straddle option within the Black-Scholes model. Model parameters: S(0) = 100, a = 0.3, r = 0.05, T = 1, 
and \u\ = d= 1. All values are computed based on 1 ,000 independent runs. 

bin width in the QNPIS algorithm. However, 3fr opt gives only slightly worse results. The 
simulation results for the strikes K = 100 and K = 110 are reported in Table Q] First 
notice, that NPIS significantly outperforms LSIS because of the better approximation of the 
optimal proposal. Second, the VR factors for NPIS increase with sample size which agrees 
with Theorem 1. Third, the combination of NPIS and QMC leads to massive efficiency 
gains. Even after adjusting for the execution times the gains are enormous (see values for 
the RCE). Note, the increasing VR factors for QLSIS and QNPIS are a result of the QMC 
sampling. 



Example 2. Asian Options 
An arithmetic Asian call with payout function 



C K {S)=(^ d Y^S{t l )-K^ 
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is investigated. The optimal proposal is unimodal. This integration problem is well suited for 
NPIS/QNPIS because its ED is one. The strikes K = 100, 130, and 175 are considered. 
For strike K = 175 the option price is approximately 0.018 (for d =16) representing a rare 
event option pricing framework (which is still of practical interest). 

Table [2] shows the results for d = 16 and d = 64 discretization steps. The results of 
the Gaussian IS algorithm (GIS) based on saddle point approximation (Glasserman, Hei- 
delberger, and Shahabuddin 1999) are also reported. We emphasize that the VR and RCE 
increase with both strike level K and the sample size. The VR factors of GIS and LSIS 
are roughly constant. This coincides with the theoretical results. Particularly in the rare 
event cases, massive efficiency gains are achieved and NPIS/QNPIS improve significantly 
over their parametric competitors. In addition, the values for RCE establish that NPIS and 
QNPIS are computationally much more efficient than parametric IS strategies. In the table, 
missing values indicate that the trial stage sometimes failed to generate paths with positive 
payouts. To explain the result's dependency on the strike level, the marginalized optimal 
proposal (of the first PCA component) for different strikes were plotted (Figure [3). One can 
observe that both the mean and the variance of the marginalized optimal proposals alter with 
K. As a result of the shrinking variance (and the increasing skewness) of the marginalized 
optimal proposals, IS approaches based on pure drift changes become worse (relatively to 
NPIS/QNPIS) as K increases. 

Table [3] gives results for the case when the execution time is fixed such as in real-time 
application. The sample sizes were chosen so that all algorithms needed approximately the 
same time for execution. The values suggest that the variance of NPIS is roughly ten times 
smaller than those of existing IS techniques. 

In Table [4j the values for an Asian option with a knock-out feature are shown. The option 
will pay nothing if the arithmetic average exceeds the knock-out level K. The payout function 
is given by 

The evaluation of this option is a difficult task because the relevant domain is very narrow. 
The strike K = 140 and the knock-out levels K = 150 and K = 170 are considered. The 
EDs are two and one for K = 150 and K = 170, respectively. Both LSIS and NPIS have 
problems to generate paths with positive payouts in the trial stage (which is reflected in the 
missing values in Table|4). Again, QNPIS significantly improves over QLSIS. 
Finally, simulations for an Asian straddle option that pays 

1 d ^ d 

Ck(S) = (- £ S(U) -K) + + {K-- Y, S{U)) + 



15 



Figure 3: Standard normal distribution (dotted line), marginalized optimal proposal (of first principal compo- 
nent) for an Asian option with strike K = 60 (solid line), K = 100 (dashed line), and K — 140 (heavy line). 
Model parameters: 5(0) = 100, a = 0.3, r = 0.05, T = 1, and d= 16. 

are discussed. As for the standard straddle option, NPIS provides efficiency gains compared 
with LSIS (see Table[5}. Although, the VR factors and the RCE of QNPIS are large, they are 
much smaller than those obtained for the standard straddle option. 



In this example, multi-asset options are considered. Suppose one deals with s assets that 
satisfy 



where the correlation matrix of Zi,...,Z s is denoted by E. To keep the setting simple, 
Si(0) = 100 and corr(Zj, Zj) = 0.3 for i, j = 1, . . . , s, i ^ j is assumed. The ED is reduced 
by applying PCA to the correlation matrix. We investigate two different payout structures. 
First, the price for an average option with payout 



is computed. The second option depends on the maximum of the underlyings' final values 
and has the payout function 



From Table |6j one can observe that the results for the average option are qualitatively similar 
to those of the Asian option in Example 2. Particularly, the ED is also equal to one. 



Example 3. Multi-Asset Options 



Si(t) = Si(0) exp[(r - 0.5a 2 )t + aVtZ t ] i = l,...,s, 
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Parameters 



VR (RCE) 



N 


d 


K ED 


QMC 


GIS 


LSIS 


NPIS 


QLSIS 


QNPIS 


2 iu 


16 


100 1 


139 (139) 


10(3) 


9(2) 


21 (11) 


1,427 (113) 


859 (187) 






140 1 


17(17) 


55 (17) 


50 (10) 


200 (102) 


4,778 (375) 


5,462 (1,193) 






175 1 


2(2) 


683 (202) 


-(-) 


3,809 (1,941) 


4.3 xlO 4 (3,326) 


1.1 xlO 5 (2.5 xlO 4 ) 




64 


100 1 


145 (144) 


8(3) 


8(2) 


20 (11) 


1,409 (108) 


909 (224) 






140 1 


16 (16) 


61 (19) 


53 (10) 


245 (138) 


5,679 (434) 


7,428 (1,828) 






175 1 


2(2) 


902 (280) 


-(-) 


4,403 (2,501) 


5.8 xlO 4 (4,506) 


1.0 xlO 5 (2.5 xlO 4 ) 


2 11 


16 


100 1 


171 (173) 


9(3) 


9(2) 


28 (14) 


1 ,535 (226) 


908 (322) 






140 1 


21 (22) 


57(17) 


52 (10) 


285 (146) 


5,647 (823) 


6,443 (2,267) 






175 1 


3(3) 


680 (204) 


-(-) 


5,161 (2,646) 


4.5 xlO 4 (6,599) 


1.3 xlO 5 (4.4 xlO 4 ) 




64 


100 1 


185 (185) 


9(3) 


9(2) 


30 (17) 


1 ,583 (225) 


912 (360) 






140 1 


21 (16) 


69 (21) 


55 (10) 


329 (185) 


5,951 (847) 


8,027 (3,164) 






175 1 


2(2) 


1 ,072 (332) 


-(-) 


7,255 (4,117) 


6.2 xlO 4 (8,757) 


1.1 xlO 5 (4.4 xlO 4 ) 




16 


100 1 


339 (339) 


9(3) 


9(2) 


33 (17) 


2,549 (647) 


1 ,499 (767) 






140 1 


42 (43) 


56 (17) 


59 (12) 


324 (167) 


8,742 (2,219) 


1.0 xlO 4 (5,212) 






175 1 


5(5) 


756 (232) 


-(-) 


5,224 (2,696) 


8.7 xlO 4 (2.2 xlO 4 ) 


2.2 xlO 5 (1.1 xlO 5 ) 




64 


100 1 


354 (352) 


10(3) 


10(2) 


35 (20) 


2,743 (682) 


1,627 (921) 






140 1 


36 (36) 


68 (21) 


57 (11) 


369 (209) 


9,685 (2,407) 


1.3 xlO 4 (7,388) 






175 1 


4(4) 


1,031 (318) 


-(-) 


7,414 (4,198) 


9.7 xlO 4 (2.4 xlO 4 ) 


1.8 xlO 5 (1.0 xlO 5 ) 



Table 2: The table reports the variance reduction (VR) factors, the relative computational efficiency (RCE), and 
the effective dimension (ED) for an Asian option within the Black-Scholes model. Model parameters: 5(0) = 
100, a = 0.3, r = 0.05, T = 1. All values are computed based on 1,000 independent runs. 











VR (TV) 






Time 


ED 


MC 


GIS 




LSIS 


NPIS 


0.35 
0.7 
1.4 


1 
1 
1 


1 (2 ia ) 
1 (2 14 ) 
1 (2 15 ) 


16 (L2 11 
16 (L2 12 
17(L2 l;i 


1H J) 12 
19 J) 11 
19 J) 11 


L2 lu ' es J) 

L211.68J) 
L2 12.6 8J) 


168 {2 12 ) 
175 (2 13 ) 
158 (2 14 ) 



Table 3: The table reports the variance reduction (VR) factors, the sample sizes (N), and the effective dimen- 
sion (ED) for an Asian option within the Black-Scholes model. The execution time is fixed to 0.35, 0.7, and 1.4 
seconds, respectively. The sample sizes are chosen such that all algorithms approximately achieved the fixed 
execution time. Model parameters: 5(0) = 100, a = 0.3, r = 0.05, T = 1, K = 140, and d = 16. All values are 
computed based on 1 ,000 independent runs. 



Parameters VR (RCE) 



N 


K 


ED 


QMC LSIS 


NPIS 


QLSIS 


QNPIS 


2 iu 


150 


2 


5 (5) - (-) 


-(-) 


69 (5) 


110 (21) 




170 


1 


16(16) -(-) 


37(19) 


1 ,003 (80) 


1 ,362 (297) 


2 n 


150 


2 


6 (6) - (-) 


-(-) 


68 (10) 


123 (36) 




170 


1 


18(18) -(-) 


134 (68) 


1,168(171) 


1,613 (568) 




150 


2 


6 (6) - (-) 


-(-) 


82 (21) 


163 (63) 




170 


1 


23 (24) - (-) 


106 (55) 


1 ,530 (394) 


1,883 (961) 



Table 4: The table reports the variance reduction (VR) factors, the relative computational efficiency (RCE), 
and the effective dimension (ED) for an Asian option with a knock-out feature within the Black-Scholes model. 
Model parameters: 5(0) = 100, a = 0.3, r = 0.05, T = 1, K = 140, and d = 16. All values are computed based 
on 1,000 independent runs. 
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Parameters 



VR (RCE) 



N 


d 


ED QMC 


LSIS 


NPIS 


QLSIS 


QNPIS 


2 1U 


16 


1 193(199) 


1.2 (0.2) 


6(3) 


300 (24) 


323 (71) 




64 


1 213(214) 


1.1 (0.2) 


6(4) 


321 (25) 


361 (90) 


2 n 


16 


1 225 (233) 


1.2 (0.2) 


8(4) 


359 (53) 


418 (151) 




64 


1 256 (249) 


1.2 (0.2) 


9(5) 


397 (57) 


410 (164) 




16 


1 425 (440) 


1.2 (0.2) 


10(5) 


634 (165) 


711 (372) 




64 


1 454 (455) 


1.2 (0.2) 


11 (6) 


715 (179) 


717 (406) 



Table 5: The table reports the variance reduction (VR) factors, the relative computational efficiency (RCE), 
and the estimated effective dimension (ED) for an Asian straddle option within the Black-Scholes model. Model 
parameters: 5(0) = 100, a = 0.3, r = 0.05, T = 1, and K = 100. All values are computed based on 1,000 
independent runs. 



Parameters VR (RCE) 



N 


K 


ED QMC 


LSIS 


NPIS 


QLSIS 


QNPIS 


2 iu 


100 


1 1 79 (346) 


9(4) 


24 (24) 


4,048 (620) 


3,315 (1,384) 




140 


1 19(38) 


43 (16) 


212 (210) 


5,171 (788) 


6,269 (2,612) 




175 


1 2(3) 


-H 


3,277 (3,229) 


2.5 xlO 4 (3,856) 


4.5 xlO 4 (1.9 xlO 4 ) 


2 n 


100 


1 212(409) 


9(3) 


34 (33) 


4,249 (1,197) 


3,677 (2,475) 




140 


1 26 (50) 


48 (18) 


338 (333) 


5,438 (1,533) 


6,932 (4,697) 




175 


1 2(4) 


-H 


4,637 (4,571) 


2.9 xlO 4 (8,313) 


4.9 xlO 4 (3.3 xlO 4 ) 




100 


1 428 (830) 


9(3) 


49 (48) 


4,872 (2,403) 


3,996 (3,948) 




140 


1 49 (96) 


52 (20) 


372 (368) 


6,373 (3,157) 


7,720 (7,630) 




175 


1 4(9) 


-(-) 


5,953 (5,857) 


4.3 xlO 4 (2.1 xlO 4 ) 


6.5 xlO 4 (6.4 xlO 4 ) 



Table 6: The table reports the variance reduction (VR) factors, the relative computational efficiency (RCE), and 
the estimated effective dimension (ED) for a multi-asset average option within the Black-Scholes model. Model 
parameters: 5;(0) = 100 (i = 1, . . ., s), a = 0.3, r = 0.05, T = 1, and s = 16. All values are computed based 
on 1,000 independent runs. 



The results for the second option with strikes K = 1 50 and K = 200 are reported in 
Tables [7] and [8j respectively. The pricing of the second option is a difficult problem because 
the ED is equal to the nominal dimension. Although, for K = 200 QNPIS is superior to QMC 
and QLSIS for s = 2, 3, and 4 (in terms of the VR factors), for K = 1 50 this only holds for s = 
2 and 3. We emphasize on the massive efficiency gains obtained by QNPIS for strike K = 
200. For s > 2 the sample size used was too small for NPIS to perform well. We conclude 
that the applicability of NPIS/QNPIS depends not only on the ED of the problem but also on 
the sample size used. An LBFP estimate of the optimal proposal for the case s = 2 is plotted 
in Figure [4} Here, the PCA construction leads to a bimodal optimal proposal which can be 
closely approximated though an LBFP. 

Example 4. Cap in the CIR model 

Finally, we consider the CIR interest rate model (Cox, Ingersoll, and Ross 1985). Here, 
interest rate r t follows a square-root diffusion model 

dr t = k(6 - r t )dt + a^/ndWt. 
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Figure 4: LBFP estimate of the optimal proposal for the multi-asset max option with strike K — 150. Model 
parameters: Si(0) = 100 (i = 1,2), a = 0.3, r = 0.05, T= 1, and |w| = 2. 



Parameters 








VR (RCE) 




N s 


ED 


QMC 


LSIS 


NPIS 


QLSIS 


QNPIS 


2 n 2 
3 
4 


2 
3 
4 


44 (68) 
26 (56) 
24 (41) 


6(1.8) 
3(1.2) 
3(1) 


10 (0.6) 
0.3 (0.02) 
0.03 (0.002) 


1 45 (33) 
52 (14) 
28 (7) 


3,070 (228) 
72 (7) 
7 (0.5) 


2 ia 2 

3 
4 


2 
3 
4 


76 (136) 
42 (79) 
27 (45) 


5(1.9) 
3(1.2) 
3(1.1) 


25 (1.6) 
0.3 (0.02) 
0.05 (0.002) 


213 (91) 
72 (32) 
36 (16) 


5,848 (620) 
148(16) 
8 (0.7) 


2 i 3 2 

3 
4 


2 
3 
4 


211 (391) 
70(119) 
35 (60) 


6(2) 
3(1.1) 
3(1) 


61 (4) 
2 (0.08) 
0.01 (0.001) 


396 (270) 
95 (65) 
42 (30) 


4.7 xlO 4 (5,916) 
161 (20) 
7 (0.7) 



Table 7: The table reports the variance reduction (VR) factors, the relative computational efficiency (RCE), and 
the estimated effective dimension (ED) for a multi-asset max option with strike K = 150. Model parameters: 
Si(0) = 100 (i = 1, . .. ,s), a = 0.3, r = 0.05, and T = 1. All values are computed based on 1,000 independent 
runs. 



Parameters 








VR (RCE) 




N s 


ED 


QMC 


LSIS 


NPIS 


QLSIS 


QNPIS 


2 n 2 
3 
4 


2 
3 
4 


8(14) 
4(8) 
5(8) 


-(-) 
2 (0.6) 
0.8 (0.3) 


49 (3) 
0.2 (0.01) 
0.1 (0.006) 


65 (17) 
17(4) 
9(2) 


7,997 (652) 
165(14) 
20 (1.5) 


3 
4 


2 
3 
4 


13 (23) 
7(13) 
6(9) 


-(-) 
4(1.5) 
4(1.3) 


163 (10) 
3 (0.2) 
0.2 (0.01) 


82 (33) 
22 (10) 
11 (4.7) 


1.8 xlO 4 (1,837) 
259 (28) 
24 (2.2) 


2 i 3 2 

3 
4 


2 
3 
4 


32 (58) 
11 (19) 
8(13) 


-(-) 
5(1.7) 
4(1.5) 


304 (18) 
1 .2 (0.06) 
0.05 (0.002) 


95 (65) 
28 (19) 
13 (9.1) 


1.1 xlO 5 (1.4 xlO 4 ) 
292 (36) 
27 (2.8) 



Table 8: The table reports the variance reduction (VR) factors, the relative computational efficiency (RCE), and 
the estimated effective dimension (ED) for a multi-asset max option with strike K = 200. Model parameters: 
Si(0) = 100 {i = l,...,s), a = 0.3, r = 0.05, and T = 1. All values are computed based on 1,000 independent 
runs. 
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Parameters 






VR (RCE) 




N T 


K 


ED QMC 


LSIS 


Nrlb 


r-\\ O 1 


QNPIS 


oil * 

2 1 


.05 




? (0. 41 


0.7 (0.4) 


396 (58) 






.06 


1 271 (2801 


3 (0.6) 


3 (1.4) 


798 (119) 


? 8?8 (1 0211 




.0/ 


1 (?361 


9 (1 81 


12 (6) 




256 (91 1 




.08 


1 9 (9) 


51 (10) 


36 (19) 


458 (67) 


219 (78) 


o 
d 


.05 


1 232 (2351 


3 (0.5) 


1 .<L (O.b) 


/I OC /~7^ \ 

4ob (/I) 


2 754 (9771 




.Ob 


1 297 (2981 


5 (11 


4 (2) 


8<d0 (1<i0) 


2 555 (9051 




.07 


1 240 (247) 


10 (1 .9) 


13 (7) 


281 (42) 


288 (1 041 




no 

.08 


1 25 (25) 


25 (51 


11 (6) 


300 (44) 


157 (561 


2 1 


.05 


1 479 (4891 


2 (0 41 


1 .1 (0.6) 


820 (210) 


5 235 (2 7171 




.06 


1 58? (5881 


3 (0.6) 


4 (2) 


1 ,621 (414) 


4 961 (2 0811 




.07 






H O /~7\ 

13 (7) 


onn i a\ r\r\\ 

388 (100) 


M 74^ 




.08 


1 15(15) 


49 (10) 


43 (22) 


588 (151) 


283 (1461 


2 


.05 


1 484 (492) 


2 (0.4) 


1.9 (1) 


1,007 (257) 


5,723 (2,957) 




.06 


1 626 (634) 


5 (0.9) 


6(3) 


1 ,377 (352) 


4,182 (2,143) 




.07 


1 422 (433) 


9(1.9) 


14(7) 


375 (97) 


360 (188) 




.08 


1 44 (45) 


26 (5) 


18(9) 


374 (96) 


214 (111) 



Table 9: The table reports the variance reduction (VR) factors, the relative computational efficiency (RCE), and 
the estimated effective dimension (ED) for a cap within the CIR model. Model parameters: r = 0.07 , 9 = 0.075, 
k = 0.2, a = 0.02, and d = 16. All values are computed based on 1 ,000 independent runs. 



The first order Euler discretization yields 

r tk+1 = r tk + k(0 - r tk )At + a^Z tk , 

with Z tk ~ 7V(0, 1) and At = T/d. The aim is to evaluate the price of an interest rate cap. It 
pays (r th - K) + at time t k+x (k = 0, . . . , d - 1) subject to strike K. The discounted payout is 
given by 

d-l i 

^exp[-A^ n J(r tfc -*Q + . 

i=0 j=0 

The parameter values used in the simulations are d = 16, r = 0.07, 9 = 0.075, k = 0.2, a = 
0.02, T = 1 and 2, K = 0.06, 0.07, and 0.08. The results are reported in Table [9] and Table 
[10} Again the ED is equal to one, which explains the good performance of NPIS/QNPIS. In 
particular, QNPIS strongly outperforms QLSIS for small strike levels. 



10. CONCLUSION 

An NPIS algorithm was proposed that applies NIS to a carefully chosen subspace. The 
MSE convergence properties were explored. They establish the asymptotic optimality of the 
approach and suggest that it improves over parametric IS - at least - asymptotically. In partic- 
ular, NPIS is shown to achieve increasing efficiency compared to crude MC and parametric 
IS. Its usefulness for practical sample sizes was verified through well-known option pricing 
scenarios. Large VR factors were obtained in certain situations. It was shown that NPIS is 
advantageous over existing IS methods for problems with low ED, which is often the case in 
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Parameters 



VR (RCE) 



N T 


K 


ED QMC 


LSIS 


NPIS 


QLSIS 


QNPIS 


2 11 1 


.05 


1 238 (239) 


3 (0.5) 


0.9 (0.5) 


451 (65) 


2,924 (1,164) 




.06 


1 284 (284) 


4 (0.7) 


3(1.8) 


940 (135) 


4,225 (1.677) 




.07 


1 263 (263) 


10(2) 


15(8) 


414 (59) 


335 (133) 




.08 


1 9(9) 


48 (9) 


30 (17) 


536 (77) 


336 (133) 


2 


.05 


1 240 (239) 


3 (0.5) 


1 .4 (0.8) 


552 (79) 


3,760 (1,483) 




.06 


1 309 (308) 


7(1.2) 


5(3) 


1,013 (145) 


3,345 (1,325) 




.07 


1 270 (269) 


11 (2) 


15(8) 


410 (58) 


402 (158) 




.08 


1 28 (27) 


25 (5) 


21 (12) 


411 (59) 


162 (64) 


2 rz 1 


.05 


1 471 (472) 


2 (0.4) 


1.3 (0.7) 


870 (218) 


7,202 (4,101) 




.06 


1 571 (571) 


3 (0.6) 


5(3) 


1,808 (453) 


7,422 (4,219) 




r\~7 
.0/ 


H A C\-\ / ACk-i\ 

1 4yl (4yl) 


10 (2) 


16 (9) 


COO MOO\ 

bed (1 oo) 


A "7C ZO"7H \ 

4/0 ) 




.08 


1 17(17) 


43 (8) 


34 (19) 


674 (168) 


346 (196) 


2 


.05 


1 477 (474) 


2 (0.4) 


2(1.2) 


1 ,074 (266) 


9,622 (5,442) 




.06 


1 627 (624) 


6(1.1) 


7(4) 


1,371 (342) 


5,875 (3,328) 




.07 


1 507 (503) 


11 (2) 


16(9) 


522 (130) 


531 (299) 




.08 


1 51 (51) 


24 (5) 


15(9) 


470 (117) 


177(100) 



Table 10: The table reports the variance reduction (VR) factors, the relative computational efficiency (RCE), 
and the estimated effective dimension (ED) for a cap within the CIR model. Model parameters: r = 0.07 , 
8 = 0.075, k = 0.2, a = 0.02, and d = 64. All values are computed based on 1,000 independent runs. 



finance. Particularly, situations of rare event dependency or multi-modal optimal proposals 
are well suited for NPIS. There, existing methods often fail. The combination of NPIS and 
QMC resulted in enormous efficiency gains. A detailed description of the implementation of 
the proposed algorithm was provided. It is emphasized that NPIS can be applied "blindly". 
No analytical investigation of the payout function is required. In addition, being generally ap- 
plicable NPIS is not restricted to a specific kind of diffusion model or payout function. It can 
be applied to other settings occurring in finance, such as the estimation of option sensitivities 
or the evaluation of the value-at-risk. 



APPENDIX A: PROOF OF THEOREM 1 

Prerequisites for Theorem [1] The following quantities are not required in practical appli- 
cation. However, they are necessary for the proof of Theorem [j] Let A M be an increasing 
sequence of compact sets defined by A M = {x e RH : <f pt (x) > c M }, where c M > and 
cm — ► as M goes to infinity. For any function g, we denote the restriction of g on A M by gu- 
Furthermore, the volume of A M is denoted by V M - The NPIS estimator J^p'S j S obtained by 
substituting q opi (in the algorithm) for 



Qm ( x «) 



else. 



Assumption 1: q opl has three continuous and square integrable derivatives on its support 
and it is bounded. In addition, J(vV pt ) 4 (<f pt r 3 < 00 where V 2 <f pt = d 2 q opX /d* 2 1 + ...+ 
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d 2 <f p 7dxd- 

Assumption 2: Trial distribution q is chosen such that E[q opi q 1 ] 4 is finite on supp(<f pt ). 

Assumption 3: Sample sizes M,N — > oo, bin width /i satisfies h — ► and M/il u l — > oo. 

Additionally, we have 5 A/ > 0, Va/^m = o{h 2 ) and M 3 (Vm<5m) 4 -» oo. 

Assumption 4: c M is chosen such that ^MJ^Tl = o( ft*+yi)-^ and m + (mH)-' _ Q 

Assumption 5: The sequence c A/ guaranties (/ if pt l {(f pt <CM }) 2 = o(M _1 /i 4 + (M 2 /^)" 1 ). 

Proof of Theorem [7| Conditional on the samples {x\x 2 , . . . ,x A/ }, the variance of I^j s 
can be written as 

o| 1% f / ¥>(x)p(xu) opt, \ 2 p(x_ u ) 



i/[^ + {^ W -# W } S 



9m ( x «) 



dx 



? < ■ ^ 

with i/(x) = (p(x.)p(x u ) - J ip(x.)p(x.)dx- u - The right term in the brackets (quantifying the 
nonparametric estimation error) can be treated analogous to the proof of Theorem 2 in 
Neddermeyer (2009). However, as a result of the integration with respect to x „, a different 
variance term is obtained. The optimal bin width is derived through differentiation. 

APPENDIX B: DETERMINATION OF THE EFFECTIVE DIMENSION 

An MC procedure to determine the ED is discussed (Wang and Fang 2003). It can be 
shown that the cumulated variances satisfy 



where both x and y are vectors in R d . Hence, the ED can be computed based on the 
approximations 

l 1 

r u = jYl y-«) ~4 (* = i, 2, . . . , d) 



1 , , 



and 



i=i 



with L = l//^(/?(x i ). The samples x 1 , x 2 , . . . , x', y 1 , y 2 , . . . , y' are drawn from p. 
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